Erik Kusch
Senior Engineer
Machine Readable Nature Research Group (MANA)
Department of Research and Collections
Natural History Museum
University of Oslo
GLOBAL BIODIVERSITY INFORMATION FACILITY
GBIF – Occurrence Data for Your Research
What is GBIF?
• Intergovernmental network and research infrastructure
• Provides anyone, anywhere, free and open access to data about all types of life on Earth
• Voluntary collaboration through a Memorandum of Understanding (MoU)
• Participant nodes, with the Secretariat in Copenhagen, Denmark
BY THE NUMBERS | 14th November 2023
• 64 Country Participants
• 44 Organizational Participants
• 9 532 Peer-review papers using data
• 2 615 608 802 Species occurrence records
• 91 030 Datasets
• 2 643 Publishers
• >119 billion Records downloaded per month in 2022
Data From the GBIF Network | 14th November 2023
Data Sets in GBIF
Data From the GBIF Network | 14th November 2023
Map updated 2023-11-14
https://www.gbif.org/the-gbif-network
BY THE NUMBERS | 14th November 2023 - Norway
• 235 Peer-review papers using data (co-author from Norway)
• 49 347 021 Species occurrence records (published from Norway)
• 426 Datasets (published from Norway)
• 38 Publishers (from Norway)
What Data is in GBIF?
Evidence About Where Species HAVE Lived & When
• Digitized specimens
• Observations
• Literature
• Remote-sensing
• Environmental DNA
Common standards (DwC)
Data publishing and indexing
Data discovery and use
Data Models in GBIF
Data richness levels supported by GBIF:
• Dataset metadata (M): dataset description, taxonomic/geographic/temporal scope
• Species checklists (C): list of taxa, regional or thematic (e.g. invasive, medicinal)
• Sampling-event data (SE): species occurrences and sampling events; dates, coordinates, sampling effort / protocol, abundance
• Occurrence-only data (O): species occurrences; dates, coordinates, basis of record
Sources of GBIF Data: Digitized Museum Collections
Sources of GBIF Data: Taxonomic Literature
Data liberation
Sources of GBIF Data: DNA Derived Occurrences
Sources of GBIF Data: Peer-Reviewed Publications
Sources of GBIF Data: Citizen Science Observations
Using GBIF Data
GBIF: Multiple-Purpose Data Publishing Services
Research
data portals
portal
Bio-Collections
& ecology datasets
Environment directive reporting
Supporting Research and Sustainable Development
• Conservation
- Protected areas
- Threatened species
- Invasive species risk
• Food Security
- Crop wild relatives
- In situ, ex situ conservation of genetic diversity
- Fisheries planning
• Climate change
- Modelling impacts on species ranges
- Adaptation strategies
- Mitigation benefits, risks
• Human health
- Disease risk based on occurrence of vectors, hosts, reservoirs
- Medicinal plants
- Hazards, e.g. snakebite
Biodiversity Evidence for Research and Policy
GBIF Data Considerations
Taxonomic Biases
Image: FL Fawcett in Wheeler, Ann. Entomol. Soc. Am. 1990
Troudet et al., Scientific Reports 2017
• Animals: 1200 million
• Plants: 300 million
• Fungi: 20 million
• Bacteria: 16 million
• Viruses: 0.04 million
Spatial Bias
Observations are:
• where people go
• driven by visibility
Temporal Bias
[Figure panels: 1832-1837 and 2017-2022]
Data Issues and Flags
• Suite of flags for determining data quality
• The ones to look for especially:
- Taxon match fuzzy
- Taxon match higherrank
- Coordinate rounded
- Coordinate uncertainty in metres
• Watch out for records at coordinates 0;0!
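The flag checks above can be sketched in Python as a simple record filter. This is a minimal sketch, not a GBIF-provided routine: it assumes each record is a dictionary whose `issue` field holds semicolon-separated flag names (as in a downloaded table), and both the 1000 m uncertainty threshold and the sample records are hypothetical.

```python
# Flags named on the slide, in the upper-case form used in the `issue` field,
# plus ZERO_COORDINATE for the 0;0 warning.
FLAGS_TO_REVIEW = {
    "TAXON_MATCH_FUZZY",
    "TAXON_MATCH_HIGHERRANK",
    "COORDINATE_ROUNDED",
    "ZERO_COORDINATE",
}
MAX_UNCERTAINTY_M = 1000.0  # hypothetical threshold; pick one that suits your study


def to_float(value):
    """Convert a field to float, treating missing or empty values as None."""
    return None if value in (None, "") else float(value)


def parse_issues(issue_field):
    """Split a semicolon-separated issue string into a set of flag names."""
    return {flag.strip() for flag in issue_field.split(";") if flag.strip()}


def needs_review(record):
    """Return True if a record carries a listed flag, sits at 0;0, or is too uncertain."""
    if parse_issues(record.get("issue", "")) & FLAGS_TO_REVIEW:
        return True
    lat = to_float(record.get("decimalLatitude"))
    lon = to_float(record.get("decimalLongitude"))
    if lat == 0.0 and lon == 0.0:
        return True
    uncertainty = to_float(record.get("coordinateUncertaintyInMeters"))
    return uncertainty is not None and uncertainty > MAX_UNCERTAINTY_M


# Hypothetical records for illustration only.
records = [
    {"gbifID": "1", "issue": "", "decimalLatitude": "59.9", "decimalLongitude": "10.7",
     "coordinateUncertaintyInMeters": "50"},
    {"gbifID": "2", "issue": "COORDINATE_ROUNDED;TAXON_MATCH_FUZZY",
     "decimalLatitude": "60.4", "decimalLongitude": "5.3", "coordinateUncertaintyInMeters": ""},
    {"gbifID": "3", "issue": "", "decimalLatitude": "0.0", "decimalLongitude": "0.0",
     "coordinateUncertaintyInMeters": ""},
    {"gbifID": "4", "issue": "", "decimalLatitude": "63.4", "decimalLongitude": "10.4",
     "coordinateUncertaintyInMeters": "25000"},
    {"gbifID": "5", "issue": "GEODETIC_DATUM_ASSUMED_WGS84", "decimalLatitude": "69.6",
     "decimalLongitude": "18.9", "coordinateUncertaintyInMeters": "10"},
]

clean = [r for r in records if not needs_review(r)]
print([r["gbifID"] for r in clean])
```

Whether a flagged record should be dropped or merely inspected depends on the analysis; the filter only marks candidates for review.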
(Art and Science) of Taxonomy
Taxonomic homonyms: when filtering by scientific name, check that all results are in the same part of the taxonomic tree (at least the same kingdom).
Example: Cuspidaria cuspidata (Olivi, 1792) vs. Cuspidaria cuspidata (M. Bieb.) Takht.
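The homonym check can be sketched as grouping records by name without authorship and listing every name that appears in more than one kingdom. This is a minimal sketch under assumptions: records are dictionaries with `scientificName` and `kingdom` fields, the genus-plus-epithet split is a simplification, and the kingdom values below are filled in by hand for illustration.

```python
from collections import defaultdict


def canonical_name(scientific_name):
    """Return the first two words (genus and epithet), dropping the authorship."""
    return " ".join(scientific_name.split()[:2])


def find_cross_kingdom_names(records):
    """Map each canonical name found in more than one kingdom to those kingdoms."""
    kingdoms = defaultdict(set)
    for rec in records:
        kingdoms[canonical_name(rec["scientificName"])].add(rec["kingdom"])
    return {name: sorted(found) for name, found in kingdoms.items() if len(found) > 1}


# Illustrative records; kingdom values are assumptions made for this example.
records = [
    {"scientificName": "Cuspidaria cuspidata (Olivi, 1792)", "kingdom": "Animalia"},
    {"scientificName": "Cuspidaria cuspidata (M. Bieb.) Takht.", "kingdom": "Plantae"},
    {"scientificName": "Homo sapiens Linnaeus, 1758", "kingdom": "Animalia"},
]

homonyms = find_cross_kingdom_names(records)
print(homonyms)
```

An empty result means every name in the query stays within one kingdom; a non-empty result is a prompt to filter on a taxon identifier rather than on the name string.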
Data Size
The dataset you download can be very large, depending on the extent of your query. Ensure you have the storage and computational resources to handle the data.
Consider using BigQuery or Apache Spark when working with very large data.
Only query what you need.
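For downloads that fit on disk but not comfortably in memory, one option is to stream the file row by row and keep only the columns an analysis needs. The sketch below assumes a tab-delimited table with a header row; the column selection and the sample rows are hypothetical.

```python
import csv
import io

# Columns to keep; adjust to the analysis at hand.
KEEP = ("gbifID", "species", "decimalLatitude", "decimalLongitude")


def stream_columns(handle, keep=KEEP):
    """Yield one small dictionary per row without loading the whole file."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {col: row[col] for col in keep}


# Hypothetical sample standing in for a large downloaded file.
sample = io.StringIO(
    "gbifID\tspecies\tdecimalLatitude\tdecimalLongitude\trecordedBy\n"
    "1\tHomo sapiens\t59.9\t10.7\tA. Collector\n"
    "2\tHomo sapiens\t60.4\t5.3\tB. Collector\n"
)

rows = list(stream_columns(sample))
print(rows[0])
```

With a real file, replace `sample` with `open(path, newline="", encoding="utf-8")` and consume the generator lazily instead of calling `list` on it.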
Accessing GBIF Data
How are Data Indexed in GBIF?
GBIF translates traditional nomenclature into Operational Taxonomic Units (OTUs)
PhyloCode
• Domain (Eukarya)
• Kingdom (Animalia)
• Phylum (Chordata)
• Class (Mammalia)
• Order (Primates)
• Family (Hominidae)
• Genus (Homo)
• Species (Homo sapiens)
OTUs and the GBIF Backbone
GBIF backbone taxonomy
• Species Hypothesis (SH) numbers [DOI], e.g. SH ABC0001
• Barcode Index Number (BIN), e.g. BIN DEF0002
Machine-Readability Requires Persistent Identifiers
The purpose of identifiers is
… to name things
… making it possible to refer to them
• To uniquely identify something it needs a persistent identifier, a PID. "A Persistent Identifier is globally unique, persistent, and resolvable".
• A PID is resolvable when it allows both human and machine users to access an object or its representation, and its Kernel Information.
• Kernel Information is a structured record that contains information (metadata) about the referred object, such as a pointer to the location where the data for the object can be found.
FAIR data is about machine-readable data
Researchers & museums need to do more than simply post their data on the web for it to be re-usable.
GBIF & Identifiers
https://www.gbif.org/occurrence/1095052193
Dataset: Vascular Plant Herbarium, Oslo (O) UiO
Publisher: University of Oslo
Catalogue number: 2007334
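The occurrence URL above is the human-facing view of the record. A minimal sketch of deriving the matching machine-readable address follows, assuming the record is also served under the public GBIF API at `https://api.gbif.org/v1/occurrence/`; the helper names are hypothetical and no request is sent.

```python
from urllib.parse import urlparse

API_BASE = "https://api.gbif.org/v1/occurrence/"


def occurrence_key(portal_url):
    """Extract the numeric occurrence key from a gbif.org occurrence page URL."""
    parts = urlparse(portal_url).path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "occurrence" or not parts[1].isdigit():
        raise ValueError(f"not an occurrence page URL: {portal_url}")
    return int(parts[1])


def api_url(portal_url):
    """Build the API address for the same occurrence record."""
    return f"{API_BASE}{occurrence_key(portal_url)}"


print(api_url("https://www.gbif.org/occurrence/1095052193"))
```

Fetching that address with any HTTP client should return the record as JSON, which is the form a script would consume.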
Discovering Data in GBIF – The Data Portal
• Search GBIF for data via the GBIF Data Portal
• Search can be refined via filters
• Data types: dataset metadata, species checklists, sampling-event data, occurrence data
Downloading Data in GBIF – The Data Portal
Download through the GBIF Data Portal is a three-step process:
1. Select desired data
2. Stage download & wait for GBIF to finish processing
3. Download final product
Programmatic Data Discovery & Download
Discovery & download takes four R function calls (rgbif package):
1. Initial data search: occ_search(…)
2. Asynchronous download:
   1. occ_download(…)
   2. occ_download_get(…)
   3. occ_download_import(…)
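To show what the asynchronous download step submits, the sketch below builds a request body in the JSON shape accepted by the GBIF occurrence download API. It is an illustration in Python rather than the R workflow on the slide: the function name, user name, e-mail address, and taxon key are hypothetical placeholders, and nothing is sent over the network.

```python
import json


def build_download_request(creator, email, taxon_key, country):
    """Assemble a download request: one taxon, one country, georeferenced records only."""
    return {
        "creator": creator,
        "notificationAddresses": [email],
        "sendNotification": True,
        "format": "SIMPLE_CSV",
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
                {"type": "equals", "key": "COUNTRY", "value": country},
                {"type": "equals", "key": "HAS_COORDINATE", "value": "true"},
            ],
        },
    }


# Placeholder values for illustration; replace with your own account and taxon key.
payload = build_download_request(
    creator="my_gbif_username",
    email="me@example.org",
    taxon_key=1234567,
    country="NO",
)
print(json.dumps(payload, indent=2))
```

Submitting such a body requires an authenticated POST with a GBIF account; the service then prepares the file in the background, which is why retrieval and import are separate steps.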
Long-Term Download Accessibility
We store all download files as long as possible.
The download metadata page will always resolve, but the file itself might be removed in the future.
We strive to store all downloads, but prioritize downloads that have been cited.
Accrediting GBIF Data
How to Cite Data Mediated by GBIF
1. Download data from GBIF.org
2. Receive the recommended citation with a download DOI
3. Cite the DOI in published research or other work
Example: GBIF.org (9 November 2021) GBIF Occurrence Download https://doi.org/10.15468/dl.xxxxxx
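The citation pattern in the example can be reproduced with a small helper. This is a sketch that follows the example string only; the function name is hypothetical and the DOI suffix is kept as the elided placeholder from the example. The citation that GBIF supplies with the download remains the authoritative one.

```python
from datetime import date

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def format_citation(download_doi, download_date):
    """Format a download citation following the pattern of the example above."""
    stamp = f"{download_date.day} {MONTHS[download_date.month - 1]} {download_date.year}"
    return f"GBIF.org ({stamp}) GBIF Occurrence Download https://doi.org/{download_doi}"


citation = format_citation("10.15468/dl.xxxxxx", date(2021, 11, 9))
print(citation)
```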
Downloads and Datasets are Assigned DOIs
The download DOI you cite resolves to the DOIs of every dataset that contributed records to the download.
This way, all data publishers contributing data records will be accredited!
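To see which source datasets a download draws on, the records can be tallied by dataset. The sketch below assumes each record carries a `datasetKey` field; the keys and records shown are hypothetical placeholders.

```python
from collections import Counter


def records_per_dataset(records):
    """Count how many records each source dataset contributed."""
    return Counter(rec["datasetKey"] for rec in records)


# Hypothetical records for illustration only.
records = [
    {"gbifID": "1", "datasetKey": "dataset-a"},
    {"gbifID": "2", "datasetKey": "dataset-b"},
    {"gbifID": "3", "datasetKey": "dataset-a"},
]

tally = records_per_dataset(records)
for dataset_key, count in tally.most_common():
    print(dataset_key, count)
```

Citing the download DOI already credits these datasets; the tally is only useful for reporting which sources dominate an analysis.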
[Figure: from source datasets to published research]
Data stages: Source datasets #1, #2, #3; GBIF download; Final state of data
Steps: Publish datasets in GBIF; Filter & download; Process & archive; Analyze & publish
DOIs: Dataset DOIs; Download DOI; Bibliographic DOI
Identifiers: institutionID; collectionID; materialSampleID; identifiedByID
• ROR for institutions
• ORCID for curators
• DOI for datasets
• (GRSciColl UUID for collections)
will enable the linking of museum collection specimens to scientific literature and scientific actors (authors, curators, etc.)
Digital Object Identifier (DOI)
Open Researcher and Contributor ID (ORCID)
Research Organisation Registry (ROR)
THANK YOU
www.gbif.org
Erik Kusch | erik.kusch@nhm.uio.no
Senior Engineer
Machine Readable Nature Research Group - MANA
Department of Research and Collections
Natural History Museum
University of Oslo